% !TEX root = ../sutton_learning_1988.tex
\begin{center}
  \textsc{Abstract}
\end{center}
%
\noindent
%
This article introduces a class of incremental learning procedures
specialized for prediction---that is,
for using past experience with an incompletely known system to
predict its future behavior.
Whereas conventional prediction-learning methods assign credit by
means of the difference between predicted and actual outcomes,
the new methods assign credit by means of the difference between
\emph{temporally successive predictions}.
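As an illustrative sketch (the notation here is ours, not fixed by the abstract): given a sequence of predictions $P_1, \ldots, P_m$ of an eventual outcome $z$, a conventional method adjusts each prediction toward the final outcome, $\Delta P_t \propto z - P_t$, whereas a temporal-difference method adjusts each prediction toward its successor,
\[
  \Delta P_t \;\propto\; P_{t+1} - P_t,
  \qquad \text{with } P_{m+1} \equiv z,
\]
so that credit propagates through the chain of successive predictions rather than directly from the outcome.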
Although such \emph{temporal-difference methods} have been used in
Samuel's checker player,
Holland's bucket brigade,
and the author's Adaptive Heuristic Critic,
they have remained poorly understood.
Here we prove their convergence and optimality for special cases and
relate them to supervised-learning methods.
For most real-world prediction problems,
temporal-difference methods require less memory and
less peak computation than conventional methods;
\emph{and}~they produce more accurate predictions.
We argue that
most problems to which supervised learning is currently applied are
really prediction problems of the sort to which
temporal-difference methods can be applied to advantage.

\vspace{1cm}

\noindent \textbf{Keywords:}
Incremental learning,
prediction,
connectionism,
credit assignment,
evaluation functions
